Adaptive Load Balancing for Parameter Servers in Distributed Machine Learning over Heterogeneous Networks
CAI Weibo, YANG Shulin, SUN Gang, ZHANG Qiming, YU Hongfang
ZTE Communications, 2023, 21(1): 72-80. DOI: 10.12142/ZTECOM.202301009
Abstract

In distributed machine learning (DML) based on the parameter server (PS) architecture, an unbalanced communication load across PSs slows model synchronization significantly in heterogeneous networks because available bandwidth is poorly utilized. To address this problem, a network-aware adaptive PS load distribution scheme is proposed that accelerates model synchronization by proactively adjusting the communication load on each PS according to the observed network state. We evaluate the proposed scheme on MXNet, a real-world distributed training platform, and the results show that it speeds up model training by up to 2.68 times in dynamic, heterogeneous network environments.
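
The core idea, shifting synchronization traffic toward PSs with better measured connectivity, can be illustrated with a small sketch. The following is a minimal illustration only, not the paper's algorithm: the function name, the greedy placement policy, and the bandwidth-measurement inputs are all assumptions made for exposition. It reassigns parameter shards to servers in proportion to each server's currently measured bandwidth, so faster links carry more of the model-synchronization bytes.

```python
# Minimal sketch (hypothetical, not the authors' scheme): assign parameter
# shards to parameter servers in proportion to each server's measured
# bandwidth, so faster links carry a larger share of the sync traffic.

def rebalance(shards, bandwidths):
    """Assign shards to servers proportionally to measured bandwidth.

    shards     -- list of (shard_id, size_bytes) tuples
    bandwidths -- dict mapping server_id -> measured bandwidth (bytes/s)
    """
    total_bw = sum(bandwidths.values())
    total_size = sum(size for _, size in shards)
    # Target byte load per server, proportional to its bandwidth share.
    targets = {s: total_size * bw / total_bw for s, bw in bandwidths.items()}
    assignment = {}
    load = {s: 0 for s in bandwidths}
    # Greedy: place the largest shards first on whichever server is
    # furthest below its bandwidth-proportional target.
    for shard_id, size in sorted(shards, key=lambda x: -x[1]):
        server = min(load, key=lambda s: load[s] / targets[s])
        assignment[shard_id] = server
        load[server] += size
    return assignment

# Example: server B reports twice server A's bandwidth, so it ends up
# holding roughly two-thirds of the parameter bytes.
shards = [(i, 100) for i in range(9)]
print(rebalance(shards, {"A": 1e9, "B": 2e9}))
```

In this toy example the split is static, whereas a network-aware scheme such as the one described above would re-run the rebalancing whenever the measured bandwidths change, which is what makes it adaptive in a dynamic network.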
